Training Parsers on Incompatible Treebanks

نویسنده

  • Richard Johansson
چکیده

We consider the problem of training a statistical parser in the situation when there are multiple treebanks available, and these treebanks are annotated according to different linguistic conventions. To address this problem, we present two simple adaptation methods: the first method is based on the idea of using a shared feature representation when parsing multiple treebanks, and the second method on guided parsing where the output of one parser provides features for a second one. To evaluate and analyze the adaptation methods, we train parsers on treebank pairs in four languages: German, Swedish, Italian, and English. We see significant improvements for all eight treebanks when training on the full training sets. However, the clearest benefits are seen when we consider smaller training sets. Our experiments were carried out with unlabeled dependency parsers, but the methods can easily be generalized to other featurebased parsers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

One model, two languages: training bilingual parsers with harmonized treebanks

We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingua...

متن کامل

Recompiling a knowledge-based dependency parser into memory

Data-driven parsers tend to be trained on manually annotated treebanks. In this paper we describe two memory-based dependency parsers trained on treebanks that are automatically parsed by a knowledge-based parser for Dutch. When compared to training on a manual treebank of Dutch, the memory-based parsers exhibit virtually the same performance at the same amount of training material, and achieve...

متن کامل

Old School vs. New School: Comparing Transition-Based Parsers with and without Neural Network Enhancement

In this paper, we attempt a comparison between "new school" transitionbased parsers that use neural networks and their classical "old school" counterpart. We carry out experiments on treebanks from the Universal Dependencies project. To facilitate the comparison and analysis of results, we only work on a subset of those treebanks. However, we carefully select this subset in the hope to have res...

متن کامل

Don't Stop Me Now! Using Global Dynamic Oracles to Correct Training Biases of Transition-Based Dependency Parsers

This paper formalizes a sound extension of dynamic oracles to global training, in the frame of transition-based dependency parsers. By dispensing with the precomputation of references, this extension widens the training strategies that can be entertained for such parsers; we show this by revisiting two standard training procedures, early-update and max-violation, to correct some of their search...

متن کامل

Cross parser evaluation : a French Treebanks study

This paper presents preliminary investigations on the statistical parsing of French by bringing a complete evaluation on French data of the main probabilistic lexicalized and unlexicalized parsers first designed on the Penn Treebank. We adapted the parsers on the two existing treebanks of French (Abeillé et al., 2003; Schluter and van Genabith, 2007). To our knowledge, mostly all of the results...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013